Abstract

Elicitation is the study of statistics or properties which are computable via empirical risk minimization. 
While several recent papers have approached the general question of which properties are elicitable, we 
suggest that this is the wrong question—all properties are elicitable by first eliciting the entire distribution 
or data set, and thus the important question is how elicitable. Specifically, what is the minimum number 
of regression parameters needed to compute the property? 

Building on previous work, we introduce a new notion of elicitation complexity and lay the foundations 
for a calculus of elicitation. We establish several general results and techniques for proving upper and 
lower bounds on elicitation complexity. These results provide tight bounds for eliciting the Bayes risk 
of any loss, a large class of properties which includes spectral risk measures and several new properties 
of interest. Finally, we extend our calculus to conditionally elicitable properties, which are elicitable 
conditioned on knowing the value of another property, giving a necessary condition for the elicitability 
of both properties together. 


1 Introduction 

Empirical risk minimization (ERM) is a dominant framework for supervised machine learning, and a key
component of many learning algorithms. A statistic or property is simply a functional assigning a vector 
of values to each distribution. We say that such a property is elicitable, if for some loss function it can be 
represented as the unique minimizer of the expected loss under the distribution. Thus, the study of which 
properties are elicitable can be viewed as the study of which statistics are computable via ERM [1, 2, 3]. 

The study of property elicitation began in statistics [4, 5, 6, 7], and is gaining momentum in machine 
learning [8, 1, 2, 3], economics [9, 10], and most recently, finance [11, 12, 13, 14, 15]. A sequence of 
papers starting with Savage [4] has looked at the full characterization of losses which elicit the mean of 
a distribution, or more generally the expectation of a vector-valued random variable [16, 3]. The case of 
real-valued properties is also now well in hand [9, 1]. The general vector-valued case is still largely open,
with recent progress in [3, 2, 15]. Recently, a parallel thread of research has been underway in finance, to 
understand which financial risk measures, among several in use or proposed to help regulate the risks of 
financial institutions, are computable via regression, i.e., elicitable (cf. references above). More often than 
not, these papers have concluded that most risk measures under consideration are not elicitable, notable 
exceptions being generalized quantiles (e.g. value-at-risk, expectiles) and expected utility [13, 12]. 

Throughout the growing momentum of the study of elicitation, one question has been central: which 
properties are elicitable? It is clear, however, that all properties are elicitable if one first elicits the distribution 
using a standard proper scoring rule. Therefore, in the present work, we suggest replacing this question with a 
more nuanced one: how elicitable are various properties? Specifically, heeding the suggestion of Gneiting [7], 
we adapt to our setting the notion of elicitation complexity introduced by Lambert et al. [17], which captures 
how many parameters one needs to maintain in an ERM procedure for the property in question. Indeed, 
if a real-valued property is found not to be elicitable, such as the variance, one should not abandon it, but 
rather ask how much effort is required to compute it via ERM. 

Our work is heavily inspired by the recent progress along these lines of Fissler and Ziegel [15], who show 
that spectral risk measures of support $k$ have elicitation complexity at most $k+1$. Spectral risk measures are
among those under consideration in the finance community, and this result shows that while not elicitable in
the classical sense, their elicitation complexity is still low, and hence one can develop reasonable regression 
procedures for them. Our results extend to these and many other risk measures (see § 3.1.6), often providing 
matching lower bounds on the complexity as well. 

Our contributions are the following. We first introduce an adapted definition of elicitation complexity 
which we believe to be the right notion to focus on going forward. We establish a few simple but useful 
results which allow for a kind of calculus of elicitation; for example, conditions under which the complexity 
of eliciting two properties in tandem is the sum of their individual complexities. In § 3, we derive several 
techniques for proving both upper and lower bounds on elicitation complexity which apply primarily to 
the Bayes risks from decision theory, or optimal expected loss functions. The class includes spectral risk 
measures among several others; see § 3.1. Finally, in § 4 we turn to the case of conditional elicitation, where 
a property is elicitable if the correct value of another property is known. Complementing the case of the 
Bayes risk, which is conditionally elicitable, we give a necessary condition for the elicitability of a property 
together with its conditionee. We conclude with brief remarks and open questions. 

2 Preliminaries and Foundation 

Let $\Omega$ be a set of outcomes and $\mathcal{P} \subseteq \Delta(\Omega)$ be a convex set of probability measures. The goal of elicitation
is to learn something about the distribution $p \in \mathcal{P}$, specifically some function $\Gamma(p)$, by minimizing a loss
function.

Definition 1. A property is a function $\Gamma : \mathcal{P} \to \mathbb{R}^k$, for some $k \in \mathbb{N}$, which associates a correct report
value to each distribution.^1 We let $\Gamma_r = \{p \in \mathcal{P} \mid r = \Gamma(p)\}$ denote the set of distributions $p$ corresponding
to report value $r$.

Given a property $\Gamma$, we want to ensure that the best result is to reveal the value of the property using a
loss function that evaluates the report using a sample from the distribution.

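Definition 2. A loss function $L : \mathbb{R}^k \times \Omega \to \mathbb{R}$ elicits a property $\Gamma : \mathcal{P} \to \mathbb{R}^k$ if for all $p \in \mathcal{P}$,
$\Gamma(p) = \operatorname{arginf}_r L(r,p)$, where $L(r,p) = \mathbb{E}_p[L(r,\cdot)]$. A property is elicitable if some loss elicits it.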
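As a minimal numerical sketch of Definition 2 (not part of the original development), the snippet below replaces $p$ by an empirical sample and minimizes the average loss over a grid of candidate reports. The sample, the grid, and the helper names `empirical_risk` and `erm_on_grid` are illustrative assumptions. Under these assumptions, squared loss should recover the sample mean and absolute loss the sample median.

```python
from statistics import fmean
from typing import Callable, Sequence

Loss = Callable[[float, float], float]


def empirical_risk(loss: Loss, r: float, sample: Sequence[float]) -> float:
    """Average loss of report r over the sample: the empirical analogue of L(r, p)."""
    return fmean(loss(r, y) for y in sample)


def erm_on_grid(loss: Loss, sample: Sequence[float], grid: Sequence[float]) -> float:
    """Return the grid point with the smallest empirical risk."""
    return min(grid, key=lambda r: empirical_risk(loss, r, sample))


def squared_loss(r: float, y: float) -> float:
    return (r - y) ** 2


def absolute_loss(r: float, y: float) -> float:
    return abs(r - y)


# Hypothetical data with mean 2.0 and median 1.0.
sample = [0.0, 1.0, 1.0, 2.0, 6.0]
grid = [i / 100 for i in range(601)]  # candidate reports 0.00, 0.01, ..., 6.00

mean_report = erm_on_grid(squared_loss, sample, grid)
median_report = erm_on_grid(absolute_loss, sample, grid)
```

The grid search stands in for whatever optimizer an actual ERM procedure would use; only the choice of loss determines which statistic is recovered.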

A well-known necessary condition for elicitability is convexity of the level sets of $\Gamma$.

Proposition 1 (Osband [5]). If $\Gamma$ is elicitable, the level sets $\Gamma_r$ are convex for all $r \in \Gamma(\mathcal{P})$.

It is often useful to work with a stronger condition, that not only is $\Gamma_r$ convex, but it is the intersection
of a subspace with $\mathcal{P}$. This condition is equivalent to the existence of an identification function, a functional
describing the level sets of $\Gamma$ [17, 1].
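The convexity condition can be checked by hand on small examples. The following sketch, using hypothetical finitely supported distributions stored as dictionaries, mixes two distributions that share the same variance and shows that the mixture leaves that level set, whereas the mean, being linear in $p$, mixes linearly. The helper names `mix`, `mean` and `variance` are illustrative.

```python
from typing import Dict

Dist = Dict[float, float]  # outcome -> probability, finite support


def mix(p: Dist, q: Dist, lam: float) -> Dist:
    """Convex combination lam * p + (1 - lam) * q of two distributions."""
    outcomes = set(p) | set(q)
    return {w: lam * p.get(w, 0.0) + (1.0 - lam) * q.get(w, 0.0) for w in outcomes}


def mean(p: Dist) -> float:
    return sum(w * pw for w, pw in p.items())


def variance(p: Dist) -> float:
    m = mean(p)
    return sum(pw * (w - m) ** 2 for w, pw in p.items())


p = {-1.0: 0.5, 1.0: 0.5}  # mean 0, variance 1
q = {1.0: 0.5, 3.0: 0.5}   # mean 2, variance 1
half = mix(p, q, 0.5)      # mean 1, variance 2

# Both p and q lie in the level set {variance = 1}, but their mixture does not,
# so this level set is not convex.
level_set_is_broken = variance(p) == variance(q) != variance(half)
```

Since the level set $\{p : \sigma^2(p) = 1\}$ fails to be convex here, the necessary condition of Proposition 1 rules out elicitability of the variance on any $\mathcal{P}$ containing these three distributions.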

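Definition 3. A function $V : \Gamma(\mathcal{P}) \times \Omega \to \mathbb{R}^k$ is an identification function for $\Gamma : \mathcal{P} \to \mathbb{R}^k$, or identifies
$\Gamma$, if for all $r \in \Gamma(\mathcal{P})$ it holds that $p \in \Gamma_r \iff V(r,p) = 0 \in \mathbb{R}^k$, where as with $L(r,p)$ above we write
$V(r,p) = \mathbb{E}_p[V(r,\omega)]$. $\Gamma$ is identifiable if there exists a $V$ identifying it.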

We can now define the classes of identifiable and elicitable properties, along with the complexity of 
identifying or eliciting a given property. 

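Definition 4. Let $\mathcal{I}_k(\mathcal{P})$ denote the class of all identifiable properties $\Gamma : \mathcal{P} \to \mathbb{R}^k$, and $\mathcal{E}_k(\mathcal{P})$ denote the
class of all elicitable properties $\Gamma : \mathcal{P} \to \mathbb{R}^k$. We write $\mathcal{I}(\mathcal{P}) = \bigcup_{k \in \mathbb{N}} \mathcal{I}_k(\mathcal{P})$ and $\mathcal{E}(\mathcal{P}) = \bigcup_{k \in \mathbb{N}} \mathcal{E}_k(\mathcal{P})$.

Definition 5. A property $\Gamma$ is $k$-identifiable if there exists $\hat\Gamma \in \mathcal{I}_k(\mathcal{P})$ and $f$ such that $\Gamma = f \circ \hat\Gamma$. The
identification complexity of $\Gamma$ is defined as $\mathrm{iden}(\Gamma) = \min\{k : \Gamma \text{ is } k\text{-identifiable}\}$.

Definition 6. A property $\Gamma$ is $k$-elicitable if there exists $\hat\Gamma \in \mathcal{E}_k(\mathcal{P})$ and $f$ such that $\Gamma = f \circ \hat\Gamma$. The elicitation
complexity of $\Gamma$ is defined as $\mathrm{elic}(\Gamma) = \min\{k : \Gamma \text{ is } k\text{-elicitable}\}$.

Similarly, $\Gamma$ is $k$-elicitable with respect to a class of properties $\mathcal{C}$ if $\hat\Gamma \in \mathcal{E}_k(\mathcal{P}) \cap \mathcal{C}$ in the above, and $\mathrm{elic}_\mathcal{C}(\Gamma)$
is the corresponding elicitation complexity of $\Gamma$ with respect to $\mathcal{C}$. In particular we will often use $\mathrm{elic}_\mathcal{I}(\Gamma)$,
where $\hat\Gamma$ must be both elicitable and identifiable.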

^1 We will also consider $\Gamma : \mathcal{P} \to \mathbb{R}^{\mathbb{N}}$.

Our definition of elicitation complexity differs from the notion proposed by Lambert et al. [17], in that
the components of $\hat\Gamma$ above do not need to be individually elicitable. This turns out to have a large impact,
as under their definition the property $\Gamma(p) = \max_{\omega \in \Omega} p(\{\omega\})$ for finite $\Omega$ has elicitation complexity $|\Omega| - 1$,
whereas under our definition $\mathrm{elic}_\mathcal{I}(\Gamma) = 2$; see Example 3.1.3. Fissler and Ziegel [15] propose a closer but still
different definition, with the complexity being the smallest $k$ such that $\Gamma$ is a component of a $k$-dimensional
elicitable property. Again, this definition can lead to larger complexities than necessary; take for example
the squared mean $\Gamma(p) = \mathbb{E}_p[\omega]^2$ when $\Omega = \mathbb{R}$, which has $\mathrm{elic}_\mathcal{I}(\Gamma) = 1$ with $\hat\Gamma(p) = \mathbb{E}_p[\omega]$ and $f(x) = x^2$,
but is not elicitable and thus has complexity 2 under [15]. We believe that, modulo regularity assumptions
on $\mathcal{E}_k(\mathcal{P})$, our definition is better suited to studying the difficulty of eliciting properties: viewing $f$ as a
(potentially dimension-reducing) link function, our definition captures the minimum number of parameters
needed in an ERM computation of the property in question, followed by a simple one-time application of $f$.

As noted, we will restrict our attention to $\mathrm{elic}_\mathcal{I}$ for much of the paper, which effectively requires $\mathrm{elic}_\mathcal{I}(\Gamma) \geq
\mathrm{iden}(\Gamma)$; specifically, if $\Gamma$ is derived from some elicitable $\hat\Gamma$, then $\hat\Gamma$ must be identifiable as well. This restriction
is only relevant for our lower bounds, as our upper bounds give losses explicitly.^2 Note however that some
restriction on $\mathcal{E}_k(\mathcal{P})$ is necessary, as otherwise pathological constructions giving injective mappings from $\mathbb{R}$ to
$\mathbb{R}^k$ would render all properties 1-elicitable. To alleviate this issue, some authors require continuity (e.g. [1])
while others, as we do, require identifiability (e.g. [15]), which can be motivated by the fact that for any
differentiable loss $L$ for $\Gamma$, $V(r,\omega) = \nabla_r L(\cdot,\omega)$ will identify $\Gamma$ provided $\mathbb{E}_p[L]$ has no inflection points or local
minima. An important future direction is to relax this identifiability assumption, as there are very natural
(set-valued) properties with $\mathrm{iden} > \mathrm{elic}$.^3

2.1 Foundations of Elicitation Complexity 

In the remainder of this section, we make some simple, but useful, observations about $\mathrm{iden}(\Gamma)$ and $\mathrm{elic}_\mathcal{I}(\Gamma)$.
We have in fact already discussed one such observation: $\mathrm{elic}_\mathcal{I}(\Gamma) \geq \mathrm{iden}(\Gamma)$.

It is easy to create redundant properties in various ways. For example, given elicitable properties $\Gamma_1$ and
$\Gamma_2$ the property $\Gamma = \{\Gamma_1, \Gamma_2, \Gamma_1 + \Gamma_2\}$ clearly contains redundant information. A concrete case is $\Gamma = \{$mean
squared, variance, 2nd moment$\}$, which has $\mathrm{elic}_\mathcal{I}(\Gamma) = 2$; see Example 3.1.1. The following definitions
and lemma capture various aspects of a lack of such redundancy. All omitted proofs may be found in the
appendix.

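Definition 7. Property $\Gamma : \mathcal{P} \to \mathbb{R}^k$ in $\mathcal{I}(\mathcal{P})$ is of full rank if $\mathrm{iden}(\Gamma) = k$.

Definition 8. Properties $\Gamma, \Gamma' \in \mathcal{I}(\mathcal{P})$ are independent if $\mathrm{iden}(\{\Gamma, \Gamma'\}) = \mathrm{iden}(\Gamma) + \mathrm{iden}(\Gamma')$.

Lemma 1. If $\Gamma, \Gamma' \in \mathcal{E}(\mathcal{P})$ are full rank and independent, then $\mathrm{elic}_\mathcal{I}(\{\Gamma, \Gamma'\}) = \mathrm{elic}_\mathcal{I}(\Gamma) + \mathrm{elic}_\mathcal{I}(\Gamma')$.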

Returning to the discussion above, it is well-known that $\mathrm{elic}_\mathcal{I}(\text{variance}) = 2$, yet $\Gamma = \{\text{mean}, \text{variance}\}$
has $\mathrm{elic}_\mathcal{I}(\Gamma) = 2$, so clearly the mean and variance are not both independent and full rank. (In fact, variance
is not full rank.) However, the mean and second moment are.

Clearly, whenever $p \in \mathcal{P}$ can be uniquely determined by some number of elicitable parameters then the
elicitation complexity of every property is at most that number. The following propositions give two notable
applications of this observation.^4

Proposition 2. When $|\Omega| = n$, every property $\Gamma$ has $\mathrm{elic}_\mathcal{I}(\Gamma) \leq n - 1$.

Proof. The probability distribution is determined by the probability of any $n - 1$ outcomes, and the probability
associated with a given outcome is both elicitable and identifiable. □

Proposition 3. When $\Omega = \mathbb{R}$,^5 every property $\Gamma$ has $\mathrm{elic}_\mathcal{I}(\Gamma) \leq \omega$ (countable).

One well-studied class of properties are those where $\Gamma$ is linear, i.e., the expectation of some vector-valued
random variable. All such properties are elicitable and identifiable (cf. [4, 8, 3]), with $\mathrm{elic}_\mathcal{I}(\Gamma) \leq k$, but of
course the complexity can be lower if $\Gamma$ is not full rank.

^2 Our main lower bound (Thm 2) merely requires $\Gamma$ to have convex level sets, which is necessary by Prop. 1.

^3 One may take for example $\Gamma(p) = \operatorname{argmax}_i p(A_i)$ for a finite measurable partition $A_1, \ldots, A_n$ of $\Omega$.

^4 Note that these restrictions on $\Omega$ may easily be placed on $\mathcal{P}$ instead; e.g. finite $\Omega$ is equivalent to $\mathcal{P}$ having support on a
finite subset of $\Omega$, or even being piecewise constant on some disjoint events.

^5 Here and throughout, when $\Omega = \mathbb{R}^k$ we assume the Borel $\sigma$-algebra.

Lemma 2. Let $X : \Omega \to \mathbb{R}^k$ be $\mathcal{P}$-integrable and $\Gamma(p) = \mathbb{E}_p[X]$. Then $\mathrm{elic}_\mathcal{I}(\Gamma) = \dim(\mathrm{affhull}(\Gamma(\mathcal{P})))$, the
dimension of the affine hull of the range of $\Gamma$.

Another important case is when $\Gamma$ consists of some number of distinct quantiles. Osband [5] essentially
showed that quantiles are independent and of full rank, so their elicitation complexity is the number of
quantiles being elicited.

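Lemma 3. Let $\Omega = \mathbb{R}$ and $\mathcal{P}$ be a class of probability measures with continuously differentiable and invertible
CDFs $F$, which is sufficiently rich in the sense that for all $x_1, \ldots, x_k \in \mathbb{R}$, $\mathrm{span}(\{F^{-1}(x_1), \ldots, F^{-1}(x_k)\}, F \in
\mathcal{P}) = \mathbb{R}^k$. Let $q_\alpha$ denote the $\alpha$-quantile function. Then if $\alpha_1, \ldots, \alpha_k$ are all distinct, $\Gamma = \{q_{\alpha_1}, \ldots, q_{\alpha_k}\}$ has
$\mathrm{elic}_\mathcal{I}(\Gamma) = k$.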
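For intuition on the upper bound in Lemma 3, each quantile can be elicited by its own loss, one parameter per quantile. The sketch below uses the standard pinball loss on a hypothetical sample and searches only over the sample points, which suffices because the empirical risk is piecewise linear with kinks at those points. The function names and the data are illustrative assumptions, and the levels are chosen so that the empirical minimizer is unique.

```python
from statistics import fmean
from typing import List, Sequence


def pinball_loss(r: float, y: float, alpha: float) -> float:
    """Loss (1{y <= r} - alpha) * (r - y); its expectation is minimized at an alpha-quantile."""
    indicator = 1.0 if y <= r else 0.0
    return (indicator - alpha) * (r - y)


def empirical_quantile_by_erm(sample: Sequence[float], alpha: float) -> float:
    """Minimize the empirical pinball risk over the sample points themselves."""
    return min(sample, key=lambda r: fmean(pinball_loss(r, y, alpha) for y in sample))


# Hypothetical sample of size 10; alpha * 10 is not an integer, so each minimizer is unique.
sample = [3.0, -1.0, 2.0, 0.5, -2.0, 4.0, 1.0, 0.0, 5.0, -0.5]
alphas = [0.25, 0.75]

# One ERM problem per quantile level: k quantiles use k real parameters.
reports: List[float] = [empirical_quantile_by_erm(sample, a) for a in alphas]
```

Summing the individual pinball losses gives a single loss on $\mathbb{R}^k$ for the vector of quantiles; the lemma's content is that no fewer than $k$ parameters suffice.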

The quantile example in particular allows us to see that all complexity classes, including $\omega$, are occupied.
In fact, our results to follow will show something stronger: even for real-valued properties $\Gamma : \mathcal{P} \to \mathbb{R}$,
all classes are occupied; we give here the result that follows from our bounds on spectral risk measures in
Example 3.1.4, but this holds for many other $\mathcal{P}$; see e.g. Example 3.1.2.

Proposition 4. Let $\mathcal{P}$ be as in Lemma 3. Then for all $k \in \mathbb{N}$ there exists $\gamma : \mathcal{P} \to \mathbb{R}$ with $\mathrm{elic}_\mathcal{I}(\gamma) = k$.

3 Eliciting the Bayes Risk 

In this section we prove two theorems that provide our main tools for proving upper and lower bounds 
respectively on elicitation complexity. Of course many properties are known to be elicitable, and the losses 
that elicit them provide such an upper bound for that case. We provide such a construction for properties that 
can be expressed as the pointwise minimum of an indexed set of functions. Interestingly, our construction 
does not elicit the minimum directly, but as a joint elicitation of the value and the function that realizes this 
value. The form (1) is that of a scoring rule for the linear property $p \mapsto \mathbb{E}_p[X_a]$, except that here the index
$a$ itself is also elicited.^6

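Theorem 1. Let $\{X_a : \Omega \to \mathbb{R}\}_{a \in A}$ be a set of $\mathcal{P}$-integrable functions indexed by $A \subseteq \mathbb{R}^k$. Then if
$\inf_a \mathbb{E}_p[X_a]$ is attained, the property $\gamma(p) = \min_a \mathbb{E}_p[X_a]$ is $(k+1)$-elicitable. In particular,

$$L((r,a),\omega) = H(r) + h(r)(X_a - r) \qquad (1)$$

elicits $p \mapsto \{(\gamma(p), a) : \mathbb{E}_p[X_a] = \gamma(p)\}$ for any strictly decreasing $h : \mathbb{R} \to \mathbb{R}_+$ with $\frac{d}{dr} H = h$.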

Proof. We will work with gains instead of losses, and show that $S((r,a),\omega) = g(r) + dg_r(X_a - r)$ elicits
$p \mapsto \{(\gamma(p), a) : \mathbb{E}_p[X_a] = \gamma(p)\}$ for $\gamma(p) = \max_a \mathbb{E}_p[X_a]$. Here $g$ is convex with strictly increasing and
positive subgradient $dg$. For any fixed $a$, we have by the subgradient inequality,

$$S((r,a),p) = g(r) + dg_r(\mathbb{E}_p[X_a] - r) \leq g(\mathbb{E}_p[X_a]) = S((\mathbb{E}_p[X_a], a), p) ,$$

and as $dg$ is strictly increasing, $g$ is strictly convex, so $r = \mathbb{E}_p[X_a]$ is the unique maximizer. Now letting
$\tilde S(a,p) = S((\mathbb{E}_p[X_a], a), p)$, we have

$$\operatorname{argmax}_{a \in A} \tilde S(a,p) = \operatorname{argmax}_{a \in A} g(\mathbb{E}_p[X_a]) = \operatorname{argmax}_{a \in A} \mathbb{E}_p[X_a] ,$$

because $g$ is strictly increasing. We now have

$$\operatorname{argmax}_{a \in A,\, r \in \mathbb{R}} S((r,a),p) = \Big\{ (\mathbb{E}_p[X_a], a) : a \in \operatorname{argmax}_{a \in A} \mathbb{E}_p[X_a] \Big\} . \qquad □$$

One natural way to get such an indexed set of functions is to take an arbitrary loss function $L(r,\omega)$, in
which case this pointwise minimum corresponds to the Bayes risk, which is simply the minimum possible
expected loss under some distribution $p$.

^6 As we are focused on the complexity of elicitation, we have not tried to fully characterize all ways to elicit this joint property
(or other properties we give explicit losses for). See Section 3.1.1 for an example where additional losses are possible.

Definition 9. Given loss function $L : A \times \Omega \to \mathbb{R}$ on some prediction set $A$, the Bayes risk of $L$ is defined
as $\underline{L}(p) := \inf_{a \in A} L(a,p)$.

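One illustration of the power of Theorem 1 is that the Bayes risk of a loss eliciting a $k$-dimensional
property is itself $(k+1)$-elicitable.

Corollary 1. If $L : \mathbb{R}^k \times \Omega \to \mathbb{R}$ is a loss function eliciting $\Gamma : \mathcal{P} \to \mathbb{R}^k$, then the loss

$$L((r,a),\omega) = L'(a,\omega) + H(r) + h(r)(L(a,\omega) - r) \qquad (2)$$

elicits $\{\underline{L}, \Gamma\}$, where $h : \mathbb{R} \to \mathbb{R}_+$ is any positive strictly decreasing function, $H(r) = \int_0^r h(x)\,dx$, and $L'$ is
any surrogate loss eliciting $\Gamma$.^7 If $\Gamma \in \mathcal{I}_k(\mathcal{P})$, $\mathrm{elic}_\mathcal{I}(\underline{L}) \leq k + 1$.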

We now turn to our second theorem which provides lower bounds for the elicitation complexity of the
Bayes risk. A first observation, which follows from standard convex analysis, is that $\underline{L}$ is concave, and thus it
is unlikely to be elicitable directly, as the level sets of $\underline{L}$ are likely to be non-convex. To show a lower bound
greater than 1, however, we will need much stronger techniques. In particular, while $\underline{L}$ must be concave,
it may not be strictly so, thus enabling level sets which are potentially amenable to elicitation. In fact, $\underline{L}$
must be flat between any two distributions which share a minimizer. Crucial to our lower bound is the fact
that whenever the minimizer of $L$ differs between two distributions, $\underline{L}$ is essentially strictly concave between
them.

Lemma 4. Suppose loss $L$ with Bayes risk $\underline{L}$ elicits $\Gamma : \mathcal{P} \to \mathbb{R}^k$. Then for any $p, p' \in \mathcal{P}$ with $\Gamma(p) \neq \Gamma(p')$,
we have $\underline{L}(\lambda p + (1 - \lambda)p') > \lambda \underline{L}(p) + (1 - \lambda)\underline{L}(p')$ for all $\lambda \in (0,1)$.

With this lemma in hand we can prove our lower bound. The crucial insight is that an identification
function for the Bayes risk of a loss eliciting a property can, through a link, be used to identify that
property. The construction from Corollary 1 increases the dimension of the elicitation by 1, and our lower
bound shows this is often necessary. However, it is not always, as in the case of linear properties the property
value provides all the information required to compute the Bayes risk for some choices of proper loss; for
example, dropping the $y^2$ term from squared loss gives $L(x,y) = x^2 - 2xy$ and $\underline{L}(p) = -\mathbb{E}_p[y]^2$. Thus the
theorem splits the lower bound into two cases.
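The two cases can be seen numerically on a pair of hypothetical samples that share a mean but differ in variance. In the sketch below (helper names and data are illustrative assumptions), the Bayes risk of full squared loss separates the two samples, while the Bayes risk of the truncated loss $x^2 - 2xy$ does not, since it depends on the distribution only through the mean.

```python
from statistics import fmean
from typing import Callable, Sequence

Loss = Callable[[float, float], float]


def bayes_risk_on_grid(loss: Loss, sample: Sequence[float], grid: Sequence[float]) -> float:
    """Smallest empirical risk over the grid: an empirical stand-in for the Bayes risk."""
    return min(fmean(loss(r, y) for y in sample) for r in grid)


def squared_loss(r: float, y: float) -> float:
    return (r - y) ** 2


def truncated_squared_loss(r: float, y: float) -> float:
    """Squared loss with the y**2 term dropped; it still elicits the mean."""
    return r * r - 2.0 * r * y


sample_a = [1.0, 3.0]  # mean 2, variance 1
sample_b = [0.0, 4.0]  # mean 2, variance 4
grid = [i / 100 for i in range(401)]  # candidate reports 0.00, ..., 4.00

full_a = bayes_risk_on_grid(squared_loss, sample_a, grid)             # equals the variance of sample_a
full_b = bayes_risk_on_grid(squared_loss, sample_b, grid)             # equals the variance of sample_b
trunc_a = bayes_risk_on_grid(truncated_squared_loss, sample_a, grid)  # equals -(mean ** 2)
trunc_b = bayes_risk_on_grid(truncated_squared_loss, sample_b, grid)  # same value: a function of the mean only
```

In the first case the Bayes risk carries information beyond the elicited property, so an extra parameter is needed; in the second it is a link of the mean and no extra parameter is required.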

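Theorem 2. If $\Gamma \in \mathcal{E}_k(\mathcal{P})$ is elicited by loss $L$ and has $\mathrm{elic}_\mathcal{I}(\Gamma) = k$, then the expected loss $\underline{L} : p \mapsto L(\Gamma(p), p)$
has $\mathrm{elic}_\mathcal{I}(\underline{L}) \geq k$. Moreover, if there is a function $f : \mathbb{R}^k \to \mathbb{R}$ such that $\underline{L} = f \circ \Gamma$, then $\mathrm{elic}_\mathcal{I}(\underline{L}) = k$;
otherwise, $\mathrm{elic}_\mathcal{I}(\underline{L}) = k + 1$.

Proof. Let $\hat\Gamma \in \mathcal{E}_\ell$ such that $\underline{L} = g \circ \hat\Gamma$ for some $g : \mathbb{R}^\ell \to \mathbb{R}$.

We show by contradiction that for all $p, p' \in \mathcal{P}$, $\hat\Gamma(p) = \hat\Gamma(p')$ implies $\Gamma(p) = \Gamma(p')$. Otherwise, we
have $p, p'$ with $\hat\Gamma(p) = \hat\Gamma(p')$, and thus $\underline{L}(p) = \underline{L}(p')$, but $\Gamma(p) \neq \Gamma(p')$. Lemma 4 would then give us some
$p_\lambda = \lambda p + (1 - \lambda)p'$ with $\underline{L}(p_\lambda) > \underline{L}(p)$. But as the level sets $\hat\Gamma_{\hat r}$ are convex by Prop. 1, we would have
$\hat\Gamma(p_\lambda) = \hat\Gamma(p)$, which would imply $\underline{L}(p_\lambda) = \underline{L}(p)$.

We now can conclude that there exists $h : \mathbb{R}^\ell \to \mathbb{R}^k$ such that $\Gamma = h \circ \hat\Gamma$. But as $\hat\Gamma \in \mathcal{E}_\ell$, this implies
$\mathrm{elic}_\mathcal{I}(\Gamma) \leq \ell$, so clearly we need $\ell \geq k$. Finally, if $\ell = k$ we have $\underline{L} = g \circ \hat\Gamma = g \circ h^{-1} \circ \Gamma$. The upper bounds
follow from Corollary 1. □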

3.1 Examples and Applications 

We now give several applications of our results. Several upper bounds are novel, as well as all lower bounds
greater than 2. In the examples, unless we refer to $\Omega$ explicitly we will assume $\Omega = \mathbb{R}$ and write $y \in \Omega$ so
that $y \sim p$. In each setting, we also make several standard regularity assumptions which we suppress for
ease of exposition; for example, for the variance and variantile we assume finite first and second moments
(which span $\mathbb{R}^2$), and whenever we discuss quantiles we will assume that $\mathcal{P}$ is as in Lemma 3, though we will
not require as much regularity for our upper bounds.

^7 Note that one could easily lift the requirement that $\Gamma$ be a function, and allow $\Gamma(p)$ to be the set of minimizers of the loss
(cf. [18]). We will use this additional power in Example 3.1.4.

3.1.1 Variance


It is well-known that the variance $\sigma^2(p) = \mathbb{E}_p[(\mathbb{E}_p[y] - y)^2]$ is not elicitable, as its level sets are not convex,
a necessary condition by Prop. 1. Of course, one can recover $\sigma^2$ as a link of the linear (and thus elicitable)
property $(\mathbb{E}_p[y], \mathbb{E}_p[y^2])$, and hence $\mathrm{elic}_\mathcal{I}(\sigma^2) = 2$. To warm up, we show how to recover this result using our
results on the Bayes risk. We can view $\sigma^2$ as the Bayes risk of squared loss $L(x,y) = (x - y)^2$, which of
course elicits the mean: $\underline{L}(p) = \min_{x \in \mathbb{R}} \mathbb{E}_p[(x - y)^2] = \mathbb{E}_p[(\mathbb{E}_p[y] - y)^2] = \sigma^2(p)$. This gives us $\mathrm{elic}_\mathcal{I}(\sigma^2) \leq 2$
by Corollary 1, with a matching lower bound by Theorem 2, as the variance is not simply a function of the
mean. Corollary 1 gives losses such as $L((x,v),y) = e^{-v}((x - y)^2 - v) - e^{-v}$ which elicit $\{\mathbb{E}_p[y], \sigma^2(p)\}$, but
in fact there are losses which cannot be represented by the form (2), showing that we do not have a full
characterization; for example, $L((x,v),y) = v^2 + v(x - y)(2(x + y) + 1) + (x - y)^2((x + y)^2 + x + y + 1)$.

This $L$ was generated via squared loss $\| z - (y, y^2)^\top \|^2$ with respect to the norm
$\|z\|^2 = z^\top \begin{pmatrix} 1 & 1/2 \\ 1/2 & 1 \end{pmatrix} z$, which
elicits the first two moments, and link function $(z_1, z_2) \mapsto (z_1, z_2 - z_1^2)$.
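Both example losses can be checked numerically. The sketch below is an illustration under stated assumptions, not a proof: the sample, the grids and the helper names are hypothetical, and the first loss is read as form (2) with $h(v) = e^{-v}$, $H(v) = -e^{-v}$ and the $L'$ term omitted. It minimizes each empirical risk jointly over the pair $(x, v)$ and compares the minimizer with the sample mean and variance.

```python
import math
from statistics import fmean
from typing import Callable, Sequence, Tuple

PairLoss = Callable[[float, float, float], float]


def corollary_loss(x: float, v: float, y: float) -> float:
    """exp(-v) * ((x - y)**2 - v) - exp(-v): form (2) with h(v) = exp(-v), L' omitted."""
    return math.exp(-v) * ((x - y) ** 2 - v) - math.exp(-v)


def alternative_loss(x: float, v: float, y: float) -> float:
    """The loss from the text that is not of form (2)."""
    s = x + y
    return v ** 2 + v * (x - y) * (2 * s + 1) + (x - y) ** 2 * (s ** 2 + s + 1)


def minimize_on_grid(
    loss: PairLoss, sample: Sequence[float], xs: Sequence[float], vs: Sequence[float]
) -> Tuple[float, float]:
    """Joint grid search over the report (x, v) for the smallest empirical risk."""
    return min(
        ((x, v) for x in xs for v in vs),
        key=lambda xv: fmean(loss(xv[0], xv[1], y) for y in sample),
    )


# Hypothetical sample with mean 2.0 and (population-style) variance 4.4.
sample = [0.0, 1.0, 1.0, 2.0, 6.0]
xs = [i / 10 for i in range(61)]  # candidate means 0.0, ..., 6.0
vs = [i / 10 for i in range(81)]  # candidate variances 0.0, ..., 8.0

report_corollary = minimize_on_grid(corollary_loss, sample, xs, vs)
report_alternative = minimize_on_grid(alternative_loss, sample, xs, vs)
```

Both searches use two real parameters, matching $\mathrm{elic}_\mathcal{I}(\sigma^2) = 2$; the variance is read off as the second coordinate of the report.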

3.1.2 Convex Functions of Means 

Another simple example is $\gamma(p) = G(\mathbb{E}_p[X])$ for some strictly convex function $G : \mathbb{R}^k \to \mathbb{R}$ and $\mathcal{P}$-integrable
$X : \Omega \to \mathbb{R}^k$. To avoid degeneracies, we assume $\dim \mathrm{affhull}\{\mathbb{E}_p[X] : p \in \mathcal{P}\} = k$, i.e. $\Gamma$ is full rank.
Letting $\{dG_r\}_{r \in \mathbb{R}^k}$ be a selection of subgradients of $G$, the loss $L(r,\omega) = -(G(r) + dG_r(X(\omega) - r))$ elicits
$\Gamma : p \mapsto \mathbb{E}_p[X]$, and moreover we have $\gamma(p) = -\underline{L}(p)$. By Lemma 2, $\mathrm{elic}_\mathcal{I}(\Gamma) = k$. One easily checks that
$\underline{L} = -G \circ \Gamma$, so now by Theorem 2, $\mathrm{elic}_\mathcal{I}(\gamma) = k$ as well. Letting $\{X_k\}_{k \in \mathbb{N}}$ be a family of such "full rank"
random variables, this gives us a sequence of real-valued properties $\gamma_k(p) = \|\mathbb{E}_p[X_k]\|^2$ with $\mathrm{elic}_\mathcal{I}(\gamma_k) = k$,
proving Proposition 4.

3.1.3 Modal Mass

With $\Omega = \mathbb{R}$ consider the property $\gamma_\beta(p) = \max_{x \in \mathbb{R}} p([x - \beta, x + \beta])$, namely, the maximum probability mass
contained in an interval of width $2\beta$. Theorem 1 easily shows $\mathrm{elic}_\mathcal{I}(\gamma_\beta) \leq 2$, as $\hat\gamma_\beta(p) = \operatorname{argmax}_{x \in \mathbb{R}} p([x -
\beta, x + \beta])$ is elicited by $L(x,y) = \mathbb{1}_{|x - y| > \beta}$, and $\gamma_\beta(p) = 1 - \underline{L}(p)$. Similarly, in the case of finite $\Omega$, $\gamma(p) =
\max_{\omega \in \Omega} p(\{\omega\})$ is simply the expected score (gain rather than loss) of the mode $\hat\gamma(p) = \operatorname{argmax}_{\omega \in \Omega} p(\{\omega\})$,
which is elicitable for finite $\Omega$.

In both cases, one can easily check that the level sets of $\gamma$ are not convex, so $\mathrm{elic}_\mathcal{I}(\gamma) = 2$; alternatively
Theorem 2 applies in the first case. As mentioned following Definition 6, the result for finite $\Omega$ differs from
the definitions of Lambert et al. [17], where the elicitation complexity of $\gamma$ is $|\Omega| - 1$.
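For the finite case the construction is short enough to write out. The following sketch, with a hypothetical sample over three outcomes and illustrative function names, finds the mode by minimizing the empirical 0-1 loss and then reads the modal mass off as one minus the attained risk.

```python
from typing import Sequence, Tuple


def zero_one_risk(report: str, sample: Sequence[str]) -> float:
    """Empirical risk of the 0-1 loss 1{report != outcome}."""
    return sum(1 for w in sample if w != report) / len(sample)


def mode_and_modal_mass(sample: Sequence[str]) -> Tuple[str, float]:
    """ERM under 0-1 loss gives the mode; one minus its Bayes risk is the modal mass."""
    outcomes = sorted(set(sample))
    mode = min(outcomes, key=lambda r: zero_one_risk(r, sample))
    return mode, 1.0 - zero_one_risk(mode, sample)


# Hypothetical sample: "a" appears 3 times out of 6.
sample = ["a", "a", "b", "c", "a", "b"]
mode, mass = mode_and_modal_mass(sample)
```

The report consists of the mode together with the attained risk, which is the two-part structure behind the bound $\mathrm{elic}_\mathcal{I}(\gamma) \leq 2$, in contrast with tracking $|\Omega| - 1$ separate probabilities.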


3.1.4 Expected Shortfall and Other Spectral Risk Measures 

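One important application of our results on the elicitation complexity of the Bayes risk is the elicitability
of various financial risk measures. One of the most popular financial risk measures is expected shortfall
$\mathrm{ES}_\alpha : \mathcal{P} \to \mathbb{R}$, also called conditional value at risk (CVaR) or average value at risk (AVaR), which we define
as follows (cf. [19, eq. (18)], [20, eq. (3.21)]):

$$\mathrm{ES}_\alpha(p) = \inf_{z \in \mathbb{R}} \Big\{ \mathbb{E}_p\big[ \tfrac{1}{\alpha}(z - y)\mathbb{1}_{z \geq y} - z \big] \Big\} = \inf_{z \in \mathbb{R}} \Big\{ \mathbb{E}_p\big[ \tfrac{1}{\alpha}(z - y)(\mathbb{1}_{z \geq y} - \alpha) - y \big] \Big\} . \qquad (3)$$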
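Definition (3) can be evaluated directly on a sample. In the sketch below (hypothetical data and function names), the objective is piecewise linear and convex in $z$ with kinks at the sample points, so the infimum is found by checking those points only. With $\alpha n$ an integer, the result is compared with the negated average of the lowest $\alpha n$ observations, which is what (3) reduces to under the sign convention used here.

```python
from statistics import fmean
from typing import Sequence


def es_objective(z: float, sample: Sequence[float], alpha: float) -> float:
    """Empirical version of E_p[(1/alpha) * (z - y) * 1{z >= y} - z]."""
    return fmean((z - y) / alpha * (1.0 if z >= y else 0.0) - z for y in sample)


def expected_shortfall_by_erm(sample: Sequence[float], alpha: float) -> float:
    """Minimize the objective over the sample points, where a minimizer always lies."""
    return min(es_objective(z, sample, alpha) for z in sample)


def negated_lower_tail_mean(sample: Sequence[float], alpha: float) -> float:
    """Reference value: minus the mean of the lowest alpha * n observations."""
    k = round(alpha * len(sample))
    return -fmean(sorted(sample)[:k])


# Hypothetical sample of size 10 with alpha = 0.2, so the lower tail has two points.
sample = [3.0, -1.0, 2.0, 0.5, -2.0, 4.0, 1.0, 0.0, 5.0, -0.5]
alpha = 0.2

es_value = expected_shortfall_by_erm(sample, alpha)
reference = negated_lower_tail_mean(sample, alpha)
```

The minimizing $z$ is an $\alpha$-quantile of the sample, so the pair (quantile, expected shortfall) is exactly the kind of joint report that Theorem 1 elicits.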


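It was recently shown by Fissler and Ziegel [15] that $\mathrm{elic}_\mathcal{I}(\mathrm{ES}_\alpha) \leq 2$. They also consider the broader class
of spectral risk measures, which can be represented as $\rho_\mu(p) = \int_{[0,1]} \mathrm{ES}_\alpha(p)\, d\mu(\alpha)$, where $\mu$ is a probability
measure on $[0,1]$ (cf. [19, eq. (36)]). In the case where $\mu$ has finite support $\mu = \sum_{i=1}^k \beta_i \delta_{\alpha_i}$ for point
distributions $\delta$, $\beta_i > 0$, we can rewrite $\rho_\mu$ using the above as:

$$\rho_\mu(p) = \sum_{i=1}^k \beta_i \mathrm{ES}_{\alpha_i}(p) = \inf_{z \in \mathbb{R}^k} \Big\{ \mathbb{E}_p\Big[ \sum_{i=1}^k \frac{\beta_i}{\alpha_i}(z_i - y)(\mathbb{1}_{z_i \geq y} - \alpha_i) - y \Big] \Big\} . \qquad (4)$$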


They conclude $\mathrm{elic}_\mathcal{I}(\rho_\mu) \leq k + 1$ unless $\mu(\{1\}) = 1$ in which case $\mathrm{elic}_\mathcal{I}(\rho_\mu) = 1$. We show how to recover
these results together with matching lower bounds. It is well-known that the infimum in eq. (4) is attained
by any of the $k$ quantiles in $q_{\alpha_1}(p), \ldots, q_{\alpha_k}(p)$, so we conclude $\mathrm{elic}_\mathcal{I}(\rho_\mu) \leq k + 1$ by Theorem 1, and in
particular the property $\{\rho_\mu, q_{\alpha_1}, \ldots, q_{\alpha_k}\}$ is elicitable. The family of losses from Corollary 1 coincides with
the characterization of Fissler and Ziegel [15] (see § E.1). For a lower bound, as $\mathrm{elic}_\mathcal{I}(\{q_{\alpha_1}, \ldots, q_{\alpha_k}\}) = k$
whenever the $\alpha_i$ are distinct by Lemma 3, Theorem 2 gives us $\mathrm{elic}_\mathcal{I}(\rho_\mu) = k + 1$ whenever $\mu(\{1\}) < 1$, and
of course $\mathrm{elic}_\mathcal{I}(\rho_\mu) = 1$ if $\mu(\{1\}) = 1$.

3.1.5 Variantile 

The $\tau$-expectile, a type of generalized quantile introduced by Newey and Powell [21], is defined as the
solution $x = \mu_\tau$ to the equation $\mathbb{E}_p[|\mathbb{1}_{x \geq y} - \tau|(x - y)] = 0$. (This also shows $\mu_\tau \in \mathcal{I}_1$.) Here we propose
the $\tau$-variantile, an asymmetric variance-like measure with respect to the $\tau$-expectile: just as the mean $\mu$ is
the solution $x = \mu$ to the equation $\mathbb{E}_p[x - y] = 0$, and the variance is $\sigma^2(p) = \mathbb{E}_p[(\mu - y)^2]$, we define the
$\tau$-variantile $\sigma^2_\tau$ by $\sigma^2_\tau(p) = \mathbb{E}_p[|\mathbb{1}_{\mu_\tau \geq y} - \tau|(\mu_\tau - y)^2]$.

It is well-known that $\mu_\tau$ can be expressed as the minimizer of an asymmetric least squares problem: the
loss $L(x,y) = |\mathbb{1}_{x \geq y} - \tau|(x - y)^2$ elicits $\mu_\tau$ [21, 7]. Hence, just as the variance turned out to be a Bayes risk
for the mean, so is the $\tau$-variantile for the $\tau$-expectile:

$$\mu_\tau = \operatorname{argmin}_{x \in \mathbb{R}} \mathbb{E}_p\big[ |\mathbb{1}_{x \geq y} - \tau|(x - y)^2 \big], \qquad \sigma^2_\tau = \min_{x \in \mathbb{R}} \mathbb{E}_p\big[ |\mathbb{1}_{x \geq y} - \tau|(x - y)^2 \big] .$$

We now see the pair $\{\mu_\tau, \sigma^2_\tau\}$ is elicitable by Corollary 1, and by Theorem 2 we have $\mathrm{elic}_\mathcal{I}(\sigma^2_\tau) = 2$.
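A small computation makes the analogy with the variance concrete. The sketch below (hypothetical sample, illustrative function names, and $\tau$ assumed to lie strictly between 0 and 1) finds the $\tau$-expectile by bisection on its identification function, which is nondecreasing in $x$, and then evaluates the $\tau$-variantile as the expected asymmetric squared loss at that point. For $\tau = 1/2$ the weight is constant, so the expectile is the mean and the variantile is half the variance.

```python
from statistics import fmean
from typing import Sequence


def asym_weight(x: float, y: float, tau: float) -> float:
    """The weight |1{x >= y} - tau| used by the expectile and variantile."""
    return abs((1.0 if x >= y else 0.0) - tau)


def expectile(sample: Sequence[float], tau: float, iters: int = 200) -> float:
    """Solve E[|1{x >= y} - tau| * (x - y)] = 0 by bisection between the sample extremes."""
    lo, hi = min(sample), max(sample)
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if fmean(asym_weight(mid, y, tau) * (mid - y) for y in sample) < 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def variantile(sample: Sequence[float], tau: float) -> float:
    """Expected asymmetric squared loss evaluated at the tau-expectile."""
    mu_tau = expectile(sample, tau)
    return fmean(asym_weight(mu_tau, y, tau) * (mu_tau - y) ** 2 for y in sample)


# Hypothetical sample with mean 2.0 and variance 4.4.
sample = [0.0, 1.0, 1.0, 2.0, 6.0]

symmetric_expectile = expectile(sample, 0.5)    # coincides with the mean
symmetric_variantile = variantile(sample, 0.5)  # coincides with half the variance
upper_expectile = expectile(sample, 0.9)        # lies above the mean
```

Bisection is used only for convenience; minimizing the asymmetric least squares loss directly would return the same expectile, with the attained minimum equal to the variantile.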


3.1.6 Deviation and Risk Measures 


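Rockafellar and Uryasev [20] introduce "risk quadrangles" in which they relate a risk $\mathcal{R}$, deviation $\mathcal{D}$, error
$\mathcal{E}$, and a statistic $\mathcal{S}$, all functions from random variables to the reals, as follows:

$$\mathcal{R}(X) = \min_C \{C + \mathcal{E}(X - C)\}, \qquad \mathcal{D}(X) = \min_C \{\mathcal{E}(X - C)\}, \qquad \mathcal{S}(X) = \operatorname{argmin}_C \{\mathcal{E}(X - C)\} .$$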

Our results provide tight bounds for many of the risk and deviation measures in their paper. The most
immediate case is the expectation quadrangle case, where $\mathcal{E}(X) = \mathbb{E}[e(X)]$ for some $e : \mathbb{R} \to \mathbb{R}$. In this
case, if $\mathcal{S}(X) \in \mathcal{I}_1(\mathcal{P})$, Theorem 2 implies $\mathrm{elic}_\mathcal{I}(\mathcal{R}) = \mathrm{elic}_\mathcal{I}(\mathcal{D}) = 2$ provided $\mathcal{S}$ is non-constant and $e$ is
non-linear. This includes several of their examples, e.g. truncated mean, log-exp, and rate-based. Beyond the
expectation case, the authors show a Mixing Theorem, where they consider

$$\mathcal{D}(X) = \min_C \min_{B_1, \ldots, B_k} \Big\{ \sum_{i=1}^k \lambda_i \mathcal{E}_i(X - C - B_i) \;\Big|\; \sum_i \lambda_i B_i = 0 \Big\} = \min_{B'_1, \ldots, B'_k} \Big\{ \sum_{i=1}^k \lambda_i \mathcal{E}_i(X - B'_i) \Big\} .$$

Once again, if the $\mathcal{E}_i$ are all of expectation type and $\mathcal{S}_i \in \mathcal{I}_1$, Theorem 1 gives $\mathrm{elic}_\mathcal{I}(\mathcal{D}) = \mathrm{elic}_\mathcal{I}(\mathcal{R}) \leq k + 1$, with
a matching lower bound from Theorem 2 provided the $\mathcal{S}_i$ are all independent. The Reverting Theorem for a
pair $\mathcal{E}_1, \mathcal{E}_2$ can be seen as a special case of the above where one replaces $\mathcal{E}_2(X)$ by $\mathcal{E}_2(-X)$. Consequently, we
have tight bounds for the elicitation complexity of several other examples, including superquantiles (the same
as spectral risk measures), the quantile-radius quadrangle, and optimized certainty equivalents of Ben-Tal
and Teboulle [22].

Our results offer an explanation for the existence of regression procedures for some of these risk/deviation
measures. For example, a procedure called superquantile regression was introduced in Rockafellar et al. [23],
which computes spectral risk measures. In light of Theorem 1, one could interpret their procedure as simply
performing regression on the $k$ different quantiles as well as the Bayes risk. In fact, our results show that any
risk/deviation generated by mixing several expectation quadrangles will have a similar procedure, in which
the $B'_i$ variables are simply computed alongside the measure of interest. Even more broadly, such regression
procedures exist for any Bayes risk.



4 Conditional Elicitation 

When considering a non-elicitable property $\Gamma$, it is sometimes the case that $\Gamma$ would be elicitable if only
the value of some other elicitable property $\Gamma'$ were known. This is the notion of conditional elicitability
introduced by Emmer et al. [11], who showed that the variance and expected shortfall are both conditionally
elicitable, on $\mathbb{E}_p[y]$ and $q_\alpha(p)$ respectively. Intuitively, knowing that $\Gamma$ is elicitable conditional on an elicitable
$\Gamma'$ would suggest that perhaps the pair $\{\Gamma, \Gamma'\}$ is elicitable; Fissler and Ziegel [15] note that it is an open
question whether this joint elicitability holds in general. We now extend our definitions to this setting, and
provide insights as to when conditionally elicitable properties are jointly elicitable in the sense above, with
both positive and negative examples.

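Definition 10. Given $\Gamma' \in \mathcal{E}_{k'}(\mathcal{P})$, we define $\mathcal{E}_k(\mathcal{P}|\Gamma')$ to be the set of properties $\Gamma : \mathcal{P} \to \mathbb{R}^k$ such that
$\Gamma|_{\Gamma'_r} \in \mathcal{E}_k(\Gamma'_r)$ for all $r \in \Gamma'(\mathcal{P})$. That is, for any $r$, $\Gamma$ restricted to the set $\Gamma'_r$ is elicitable. As before we
define $\mathcal{I}_k(\mathcal{P}|\Gamma')$ similarly, and write $\mathcal{E}(\mathcal{P}|\Gamma') = \bigcup_{k \in \mathbb{N}} \mathcal{E}_k(\mathcal{P}|\Gamma')$ and $\mathcal{I}(\mathcal{P}|\Gamma') = \bigcup_{k \in \mathbb{N}} \mathcal{I}_k(\mathcal{P}|\Gamma')$.

Definition 11. We say $\Gamma$ is $k$-elicitable conditioned on $\Gamma'$ if $\Gamma|_{\Gamma'_{r'}}$ is $k$-elicitable for all $r' \in \Gamma'(\mathcal{P})$, and
$\mathrm{elic}(\Gamma|\Gamma') = \min\{k : \Gamma \text{ is } k\text{-elicitable conditioned on } \Gamma'\}$. We define $\mathrm{elic}_\mathcal{I}(\Gamma|\Gamma')$ similarly.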

A first observation, also made by Fissler and Ziegel [15], is that any conditionally identifiable property is
jointly identifiable by simply combining the identification functions.

Lemma 5. Let $\Gamma' \in \mathcal{I}_{k'}(\mathcal{P})$ and $\Gamma \in \mathcal{I}_k(\mathcal{P}|\Gamma')$. Then $\{\Gamma, \Gamma'\} \in \mathcal{I}_{k+k'}(\mathcal{P})$.

A natural question is then, are conditionally elicitable properties jointly elicitable? Our main theorem
in this section is the following result, which gives a necessary condition for identifiability to imply joint
elicitability. We apply techniques from previous work, and as such need two regularity/smoothness assumptions
which we detail in the appendix.

Theorem 3. Let $\Gamma' \in \mathcal{E}(\mathcal{P})$ be identified by differentiable $V'(r',\omega)$ and $\Gamma \in \mathcal{E}(\mathcal{P}|\Gamma')$ be identified by differentiable
$V(r, r', \omega)$ conditioned on $\Gamma'$. If $W(r, r', \omega) = \{V'(r', \omega), V(r, r', \omega)\}$ satisfies Assumption 1 and $\partial_{r'} V(r, r', p)$
is not constant on $\Gamma_r \cap \Gamma'_{r'}$, but both $\partial_{r'} V'(r', p)$ and $\partial_r V(r, r', p)$ are, the pair $\{\Gamma, \Gamma'\}$ is not elicitable by any
twice differentiable loss $L$ which satisfies Assumption 2.

This result confirms the non-elicitability of Example 1 of [3], which upon examination is conditionally
elicitable on a linear property. Theorem 3 also allows us to show in Corollary 2 that the central moment
$\mu_n(p) = \mathbb{E}_p[(y - \mathbb{E}_p[y])^n]$, while elicitable conditioned on the mean $\mathbb{E}_p[y]$, is not jointly elicitable. Of course,
properties are jointly elicitable with their Bayes risk, and one easily checks that $\nabla V$ is constant within level
sets in that case. Theorem 3 suggests that in fact these may be the only such properties, which would be
an intuitive result: Bayes risks for $\Gamma$ are "incentive-aligned" with $\Gamma$, allowing us to elicit them together, but
this may be necessary as well.
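To make the conditional statement concrete: once the mean $m$ is known, $\mu_n$ is the expectation of $(y - m)^n$ and is therefore elicited by squared loss on that transformed outcome. The sketch below runs this as a two-stage ERM on a hypothetical sample with $n = 3$; the helper names and grids are illustrative assumptions. It illustrates conditional elicitability only, and does not produce a single loss for the pair, which is what Corollary 2 rules out under its assumptions.

```python
from statistics import fmean
from typing import Callable, Sequence

Loss = Callable[[float, float], float]


def erm_on_grid(loss: Loss, sample: Sequence[float], grid: Sequence[float]) -> float:
    """Return the grid point with the smallest empirical risk."""
    return min(grid, key=lambda r: fmean(loss(r, y) for y in sample))


def squared_loss(r: float, y: float) -> float:
    return (r - y) ** 2


def conditional_moment_loss(mean_value: float, n: int) -> Loss:
    """Squared loss for E[(y - mean_value) ** n], valid once the mean is known."""
    return lambda r, y: (r - (y - mean_value) ** n) ** 2


# Hypothetical sample with mean 2.0 and third central moment 10.8.
sample = [0.0, 1.0, 1.0, 2.0, 6.0]

# Stage 1: elicit the mean on its own.
mean_grid = [i / 100 for i in range(601)]
mean_report = erm_on_grid(squared_loss, sample, mean_grid)

# Stage 2: elicit the third central moment conditioned on the stage-1 report.
moment_grid = [i / 10 for i in range(201)]
third_moment_report = erm_on_grid(conditional_moment_loss(mean_report, 3), sample, moment_grid)
```

The second-stage loss depends on the first-stage report, so the two minimizations cannot simply be summed into one loss as in the proof of Lemma 1.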

Corollary 2. Let $n \geq 3$, and let $\Omega = \mathbb{R}$ and $\mathcal{P}$ such that the first $n$ moments are finite and span $\mathbb{R}^n$. Then the
property $\{\mu, \mu_{n_1}, \ldots, \mu_{n_k}\}$ is not elicitable by any twice differentiable loss which satisfies Assumption 2
for $n_i \in \{2, \ldots, n\}$ and $k \leq n - 1$. In particular, this applies to $\{\mu, \mu_n\}$.

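By analogy to Lemma 5, one might expect elicitability to be subadditive in the sense that $\mathrm{elic}_\mathcal{I}(\Gamma) \leq
\mathrm{elic}_\mathcal{I}(\Gamma|\Gamma') + \mathrm{elic}_\mathcal{I}(\Gamma')$, and in fact this holds with equality for any Bayes risk from Theorem 2: if $L$ elicits
$\Gamma'$ then $\mathrm{elic}_\mathcal{I}(\underline{L}) = \mathrm{elic}_\mathcal{I}(\underline{L}|\Gamma') + \mathrm{elic}_\mathcal{I}(\Gamma')$, as we actually have $\mathrm{elic}_\mathcal{I}(\underline{L}|\Gamma') = 0$ when $\underline{L} = f \circ \Gamma'$. In light of
Corollary 2 and the following Proposition, however, we conjecture $\mathrm{elic}_\mathcal{I}(\mu_n) = n$ for all $n > 0$, which would
provide a counter-example as $\mathrm{elic}_\mathcal{I}(\mu_n|\mu) = 1$ and $\mathrm{elic}_\mathcal{I}(\mu) = 1$. If this conjecture were the case, we would
also have an unbounded gap between $\mathrm{iden}$ and $\mathrm{elic}_\mathcal{I}$.

Proposition 5. If the central moment $\mu_n$ is elicitable via $\hat\Gamma \in \mathcal{I}_2(\mathcal{P})$, then $\hat\Gamma = g \circ \{\mu, \mu_n\}$.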


5 Discussion 

We have outlined a theory of elicitation complexity which we believe is the right notion of complexity for 
ERM, and provided techniques and results for upper and lower bounds. In particular, we now have tight 
bounds for the large class of Bayes risks, including several applications of note such as spectral risk measures. 
Our results also offer an explanation for why procedures like superquantile regression are possible, and extend 
this logic to all Bayes risks. There are also a number of natural open problems in elicitation complexity. For
example, is the elicitation complexity of the nth central moment equal to n? Are there conditionally elicitable
properties other than Bayes risks which are jointly elicitable? What is the elicitation complexity of the mode 
and other properties whose non-elicitability is known? Finally, the most general open question remains a 
full characterization of elicitable vector-valued properties and the losses eliciting them. 

A Short Proofs


Proof of Corollary 1. The only nontrivial part is showing that $\{\underline{L}, \Gamma\}$ is identifiable. Let $V(a,\omega)$ identify $\Gamma$. Then $V'((r,a),\omega) = \{V(a,\omega),\, L(a,\omega) - r\}$ identifies $\{\underline{L}, \Gamma\}$. □

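Proof of Lemma 1. Let $\Gamma : \mathcal{P} \to \mathbb{R}^k$ and $\Gamma' : \mathcal{P} \to \mathbb{R}^{k'}$. Unfolding our definitions, we have $\mathrm{elic}_{\mathcal{I}}(\{\Gamma,\Gamma'\}) \ge \mathrm{iden}(\{\Gamma,\Gamma'\}) = \mathrm{iden}(\Gamma) + \mathrm{iden}(\Gamma') = k + k'$. For the upper bound, we simply take losses $L$ and $L'$ for $\Gamma$ and $\Gamma'$, respectively, and elicit $\{\Gamma,\Gamma'\}$ via $\hat{L}(r,r',\omega) = L(r,\omega) + L'(r',\omega)$. □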
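
The upper-bound construction can be illustrated numerically. The following is a minimal sketch, not part of the original proof: it assumes a small hypothetical data set, takes squared loss for the mean and absolute loss for the median as the two component losses, and checks by grid search that the summed loss is minimized at the pair (mean, median).

```python
import numpy as np

# Hypothetical data set standing in for the distribution p.
y = np.array([0.0, 1.0, 2.0, 3.0, 10.0])

def squared_loss(r, y):
    """Loss L(r, y) eliciting the mean."""
    return (r - y) ** 2

def absolute_loss(r_prime, y):
    """Loss L'(r', y) eliciting the median."""
    return np.abs(r_prime - y)

def summed_loss(r, r_prime, y):
    """Combined loss L(r, y) + L'(r', y), as in the proof of Lemma 1."""
    return squared_loss(r, y) + absolute_loss(r_prime, y)

# Grid search over report pairs (r, r'); the empirical risk is the
# average of the summed loss over the data.
grid = np.linspace(0.0, 10.0, 201)
R, R_PRIME = np.meshgrid(grid, grid, indexing="ij")
risk = summed_loss(R[..., None], R_PRIME[..., None], y).mean(axis=-1)

i, j = np.unravel_index(np.argmin(risk), risk.shape)
best_r, best_r_prime = grid[i], grid[j]
print(best_r, best_r_prime)
```

Because the summed loss is separable in $r$ and $r'$, each coordinate of the minimizer is determined by its own component loss, which is why no additional parameters are needed.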

Proof of Proposition 3. We will simply show how to compute the CDF $F$ of $p$, using only countably many parameters. Let $\{q_i\}_{i\in\mathbb{N}}$ be an enumeration of the rational numbers, and $\Gamma(F)_i = F(q_i)$. We can elicit $\Gamma$ with the loss $L(\{r_i\}_{i\in\mathbb{N}}, y) = \sum_{i\in\mathbb{N}} \beta^i (r_i - \mathbb{1}_{y \le q_i})^2$ for $0 < \beta < 1$. We now have $F$ at every rational number, and by right-continuity of $F$ we can compute $F$ at irrationals. Thus, we can compute $F$, and then $\Gamma(F)$. □
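
As a rough illustration of this loss, the sketch below truncates the sum to a handful of rationals and uses a hypothetical sample; the names `qs`, `beta` and `truncated_loss` are choices made here, not notation from the paper. Each coordinate is a weighted squared loss on an indicator, so the empirical risk minimizer is the empirical CDF at the chosen points.

```python
import numpy as np

y = np.array([0.2, 0.5, 0.5, 1.3, 2.0])   # hypothetical sample
qs = [0.0, 0.5, 1.0, 1.5, 2.0]            # first few rationals of some enumeration
beta = 0.5                                # any value in (0, 1)

def truncated_loss(r, y_value):
    """Finite truncation of sum_i beta^i (r_i - 1{y <= q_i})^2."""
    indicators = np.array([float(y_value <= q) for q in qs])
    weights = beta ** np.arange(len(qs))
    return float(np.sum(weights * (r - indicators) ** 2))

def empirical_risk(r):
    """Average truncated loss of the report vector r over the sample."""
    return float(np.mean([truncated_loss(r, v) for v in y]))

# Coordinate-wise minimizer: the mean of each indicator, i.e. the
# empirical CDF evaluated at q_i.
ecdf = np.array([np.mean(y <= q) for q in qs])
risk_at_ecdf = empirical_risk(ecdf)
print(ecdf, risk_at_ecdf)
```

The geometric weights $\beta^i$ only serve to keep the infinite sum finite; they do not change the minimizer of any coordinate.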

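Proof of Lemma 2. Let $\ell = \dim \mathrm{affhull}(\Gamma(\mathcal{P}))$ and $r_0 \in \mathrm{relint}(\Gamma(\mathcal{P}))$. Then $\mathcal{V} = \mathrm{span}\{\Gamma(p) - r_0 : p \in \mathcal{P}\}$ is a vector space of dimension $\ell$ with basis $v_1, \ldots, v_\ell$. Let $M = [v_1 \,\ldots\, v_\ell] \in \mathbb{R}^{k\times\ell}$. Now define $V : \Gamma(\mathcal{P}) \times \Omega \to \mathbb{R}^\ell$ by $V(r,\omega) = M^+(X(\omega) - r)$. Clearly $\mathbb{E}_p[X] = r \implies V(r,p) = 0$, and by properties of the pseudoinverse $M^+$, as $\mathbb{E}_p[X] - r \in \mathrm{im}\, M$, we have $M^+(\mathbb{E}_p[X] - r) = 0 \implies \mathbb{E}_p[X] - r = 0$. Thus $\mathrm{iden}(\Gamma) \le \ell$. As $\dim \mathrm{span}(\{V(r,p) : p \in \mathcal{P}\}) = \dim \mathcal{V} = \ell$, by Lemma 8, $\mathrm{iden}(\Gamma) = \ell$.

Elicitability follows by letting $\Gamma'(p) = M^+(\mathbb{E}_p[X] - r_0) = \mathbb{E}_p[M^+(X - r_0)] \in \mathbb{R}^\ell$ with link $f(r') = Mr' + r_0$; $\Gamma'$ is of course elicitable as a linear property. □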
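
The role of the pseudoinverse can be checked on a small case. The sketch below is an assumed example, not taken from the paper: three outcomes with $X(\omega)$ the indicator vector of $\omega$, so that $\Gamma(p) = \mathbb{E}_p[X] = p$ lies in the simplex of $\mathbb{R}^3$, which has affine dimension 2. The particular basis in `M` and the points `p`, `q` are arbitrary choices.

```python
import numpy as np

# Assumed setup: Gamma(p) = p on the simplex in R^3, so k = 3 and l = 2.
r0 = np.full(3, 1.0 / 3.0)            # a point in the relative interior
M = np.array([[1.0, 0.0],
              [-1.0, 1.0],
              [0.0, -1.0]])           # columns v_1, v_2 span {Gamma(p) - r0}
M_pinv = np.linalg.pinv(M)            # pseudoinverse M^+, shape (2, 3)

def expected_identification(r, p):
    """V(r, p) = M^+ (E_p[X] - r), with E_p[X] = p in this example."""
    return M_pinv @ (p - r)

def link(r_prime):
    """Link f(r') = M r' + r0 mapping the 2-dimensional report back to R^3."""
    return M @ r_prime + r0

p = np.array([0.5, 0.2, 0.3])
q = np.array([0.1, 0.6, 0.3])

gamma_prime_p = M_pinv @ (p - r0)     # Gamma'(p), a linear property in R^2
print(expected_identification(p, p))  # zero exactly when the report is correct
print(expected_identification(p, q))  # nonzero for a different distribution
print(link(gamma_prime_p))            # recovers Gamma(p) = p
```

The recovery step works because $p - r_0$ lies in the column space of $M$, on which $M M^+$ acts as the identity.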

Proof of Lemma 5. Let $V'(r',\omega)$ identify $\Gamma'$ and $V(r,r',\omega)$ identify $\Gamma$ conditioned on $\Gamma'$. Let $W(r,r',\omega) = \{V'(r',\omega), V(r,r',\omega)\}$. Then

$W(r,r',p) = 0 \iff V'(r',p) = 0 \,\wedge\, V(r,r',p) = 0 \iff \Gamma'(p) = r' \,\wedge\, V(r,r',p) = 0 \iff \Gamma'(p) = r' \,\wedge\, \Gamma(p) = r$. □

Proof of Corollary 2. We label the components of $r$ from 0 for convenience. Take $V(r_0,y) = y - r_0$ and $V'(r,y) = \{(y - r_0)^{n_i} - r_i\}_{i=1}^k$. One easily checks that these functions satisfy the conditions of Theorem 3 for $\Gamma = \mu$ and $\Gamma' = \{\mu_{n_1},\ldots,\mu_{n_k}\}$; in particular $\partial_{r_0}V = -1$ and the derivative of $V'$ with respect to $(r_1,\ldots,r_k)$ is $-I_k$, the $k\times k$ identity matrix.

Thus we need only compute $\partial_{r_0}V'(r,p) = \{\mathbb{E}_p[-n_i(y - r_0)^{n_i-1}]\}_{i=1}^k = \{-n_i\,\mu_{n_i-1}(p)\}_{i=1}^k$. By assumption, the first $n$ moments span $\mathbb{R}^n$ on $\mathcal{P}$, and one can check that $\{\mu, \mu_2, \ldots, \mu_n\}$ constitute a change of basis and hence also span $\mathbb{R}^n$. Thus, letting $N = \{1, n_1, \ldots, n_k\}$ be the indices for the moments covered by $V$, the only way for $\partial_{r_0}V'(r,p)$ to be constant for all $p \in \Gamma_r$ is if $n_i - 1 \in N$ for all $i$. But the only way for this to hold is $N = \{1,\ldots,n\}$, contradicting $k < n-1$. □

B Proof of Lemma 4 

Lemma 6 ([18]). Let $G : X \to \mathbb{R}$ be convex for some convex subset $X$ of a vector space $\mathcal{V}$, and let $d \in \partial G_x$ be a subgradient of $G$ at $x$. Then for all $x' \in X$ we have

$d \in \partial G_{x'} \iff G(x) - G(x') = d(x - x')$.

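Lemma 7. Let $G : X \to \mathbb{R}$ be convex for some convex subset $X$ of a vector space $\mathcal{V}$. Let $x, x' \in X$ and $x_\lambda = \lambda x + (1-\lambda)x'$ for some $\lambda \in (0,1)$. If there exists some $d \in \partial G_{x_\lambda} \setminus (\partial G_x \cup \partial G_{x'})$, then $G(x_\lambda) < \lambda G(x) + (1-\lambda)G(x')$.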

Proof. By the subgradient inequality for $d$ at $x_\lambda$ we have $G(x) - G(x_\lambda) \ge d(x - x_\lambda)$, and furthermore Lemma 6 gives us $G(x) - G(x_\lambda) > d(x - x_\lambda)$ since otherwise we would have $d \in \partial G_x$. Similarly for $x'$, we have $G(x') - G(x_\lambda) > d(x' - x_\lambda)$.

Adding $\lambda$ times the first inequality to $(1-\lambda)$ times the second gives

$\lambda G(x) + (1-\lambda)G(x') - G(x_\lambda) > \lambda\, d(x - x_\lambda) + (1-\lambda)\, d(x' - x_\lambda)$

$= \lambda(1-\lambda)\, d(x - x') + (1-\lambda)\lambda\, d(x' - x) = 0$,

where we used linearity of $d$ and the identity $x_\lambda = x' + \lambda(x - x')$. □

Restatement of Lemma 4: Suppose loss $L$ with Bayes risk $\underline{L}$ elicits $\Gamma : \mathcal{P} \to \mathbb{R}^k$. Then for any $p, p' \in \mathcal{P}$ with $\Gamma(p) \ne \Gamma(p')$, we have $\underline{L}(\lambda p + (1-\lambda)p') > \lambda \underline{L}(p) + (1-\lambda)\underline{L}(p')$ for all $\lambda \in (0,1)$.

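Proof. Let $G = -\underline{L}$, which is the expected score function for the (positively-oriented) scoring rule $S = -L$. By Theorem 3.5 of Frongillo and Kash [18], we have some $\mathcal{D} \subseteq \partial G$ and function $\varphi : \Gamma(\mathcal{P}) \to \mathcal{D}$ such that $\Gamma(p) = \varphi^{-1}(\mathcal{D} \cap \partial G_p)$. In other words, as our $\Gamma$ is a function, there is a subgradient $d_r = \varphi(r)$ associated to each report value $r \in \Gamma(\mathcal{P})$, and $d_r \in \partial G_p \iff r = \Gamma(p)$. Thus, as we have $p, p' \in \mathcal{P}$ with $r = \Gamma(p) \ne \Gamma(p') = r'$, we also have $d_r \in \partial G_p \setminus \partial G_{p'}$ and $d_{r'} \in \partial G_{p'} \setminus \partial G_p$.

Write $p_\lambda = \lambda p + (1-\lambda)p'$. By Lemma 7, if $\Gamma(p_\lambda)$, $\Gamma(p)$, and $\Gamma(p')$ are all distinct, then we are done. Otherwise, we have $\Gamma(p_\lambda) = \Gamma(p)$ without loss of generality, which implies $d_r \in \partial G_{p_\lambda}$ by definition of $\varphi$. Now assume for a contradiction that $G(p_\lambda) = \lambda G(p) + (1-\lambda)G(p')$. By Lemma 6 for $d_r$ we have $G(p) - G(p_\lambda) = d_r(p - p_\lambda) = \frac{1-\lambda}{\lambda}\, d_r(p_\lambda - p')$. Solving for $G(p)$ and substituting into the previous equation gives $(1-\lambda)$ times the equation $G(p_\lambda) = d_r(p_\lambda - p') + G(p')$, and applying Lemma 6 once more gives $d_r \in \partial G_{p'}$, a contradiction. □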


C Proofs for Conditional Elicitation 


We will need some assumptions about the identification functions and losses we use, to guarantee that the results we rely on are applicable. The first assumption essentially says that $\mathcal{P}$ is sufficiently rich, and is required to apply a result due to Fissler and Ziegel [15]. The second assumption is that we can work directly with expected losses, and has previously been used by Frongillo and Kash [3].

Assumption 1. Let $V$ identify $\Gamma$. For all $r \in \mathrm{int}(\Gamma(\mathcal{P})) \subseteq \mathbb{R}^k$ there exist $p_1, \ldots, p_{k+1}$ such that $0 \in \mathrm{int}(\mathrm{Conv}(\{V(r,p_1), \ldots, V(r,p_{k+1})\}))$.

Assumption 2. Twice differentiable loss $L$ satisfies $\mathbb{E}_p[\nabla_r L(r,\omega)] = \nabla_r \mathbb{E}_p[L(r,\omega)]$ and similarly for second derivatives.

Proof of Theorem 3. Let $\Gamma : \mathcal{P} \to \mathbb{R}^k$, $\Gamma' : \mathcal{P} \to \mathbb{R}^{k'}$. Then by the proof of Lemma 5, $W(r,r',\omega) = \{V'(r',\omega), V(r,r',\omega)\}$ identifies $\{\Gamma,\Gamma'\}$. Let $L$ be a twice differentiable loss eliciting $\{\Gamma,\Gamma'\}$. By Assumption 1, the conditions of [15, Theorem 3.2] (commonly known as Osband's principle [5]) apply, and we may write the Jacobian of $L$ as $\nabla L(r,r',p) = H(r,r')W(r,r',p)$ where $H(r,r') \in \mathbb{R}^{(k+k')\times(k+k')}$ and $W(r,r',p) \in \mathbb{R}^{k+k'}$. Differentiating $L$ once more (using Assumption 2), and letting $p \in \Gamma_r \cap \Gamma'_{r'}$, we have $\nabla^2 L(r,r',p) = \nabla H(r,r')W(r,r',p) + H(r,r')\nabla W(r,r',p) = H(r,r')\nabla W(r,r',p)$, as the first term is zero due to identifiability. Now note that, ordering the variables as $(r', r)$ to match the components of $W$, we may write

\[
\nabla W(r,r',p) =
\begin{bmatrix}
\partial_{r'}V'(r',p) & 0 \\
\partial_{r'}V(r,r',p) & \partial_r V(r,r',p)
\end{bmatrix}
=
\begin{bmatrix}
X(r',p) & 0 \\
Y(r,r',p) & Z(r,r',p)
\end{bmatrix} .
\]

Writing $H$ in block form $[A, B; C, D]$ we have

\[
\nabla^2 L(r,r',p) =
\begin{bmatrix}
A(r,r') & B(r,r') \\
C(r,r') & D(r,r')
\end{bmatrix}
\begin{bmatrix}
X(r',p) & 0 \\
Y(r,r',p) & Z(r,r',p)
\end{bmatrix}
=
\begin{bmatrix}
A(r,r')X(r',p) + B(r,r')Y(r,r',p) & B(r,r')Z(r,r',p) \\
C(r,r')X(r',p) + D(r,r')Y(r,r',p) & D(r,r')Z(r,r',p)
\end{bmatrix} .
\]

By assumption, we have some $p, p' \in \Gamma_r \cap \Gamma'_{r'}$ such that $Y(r,r',p) \ne Y(r,r',p')$, but $X(r',p) = X(r',p')$ and $Z(r,r',p) = Z(r,r',p')$. As $\nabla^2 L$ must be symmetric for both $p$ and $p'$, we must have $CX + DY = Z^\top B^\top$ for both as well. By strict optimality of $(r,r')$, we also know that $\nabla^2 L$ must be positive definite, and thus the block diagonal elements are also positive definite, and both $D$ and $Z$, being square, are of full rank. This tells us $Y = D^{-1}(Z^\top B^\top - CX)$, which cannot hold for both $p$ and $p'$ as all terms are fixed except $Y$. □
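
The final step, that symmetry of the Hessian pins down the block $Y$, can be illustrated with concrete matrices. This is only a sketch with arbitrary hypothetical values for the blocks $A, B, C, D, X, Z$ (all $2\times 2$, with $D$ and $Z$ invertible); it checks only the off-diagonal symmetry condition used in the proof.

```python
import numpy as np

# Arbitrary hypothetical blocks; D and Z are chosen to be invertible.
A = np.array([[2.0, 0.0], [0.0, 2.0]])
B = np.array([[1.0, 2.0], [0.0, 1.0]])
C = np.array([[0.0, 1.0], [1.0, 0.0]])
D = np.array([[3.0, 1.0], [1.0, 2.0]])
X = np.array([[1.0, 0.0], [0.0, 1.0]])
Z = np.array([[2.0, 0.0], [1.0, 1.0]])

def hessian(Y):
    """Block product H @ grad(W) with grad(W) = [[X, 0], [Y, Z]]."""
    H = np.block([[A, B], [C, D]])
    grad_W = np.block([[X, np.zeros((2, 2))], [Y, Z]])
    return H @ grad_W

def off_diagonal_symmetric(Y):
    """Check that the lower-left block equals the transposed upper-right block."""
    hess = hessian(Y)
    return np.allclose(hess[2:, :2], hess[:2, 2:].T)

# The unique Y compatible with symmetry: Y = D^{-1} (Z^T B^T - C X).
Y_star = np.linalg.solve(D, Z.T @ B.T - C @ X)
Y_other = Y_star + np.array([[0.0, 1.0], [0.0, 0.0]])

print(off_diagonal_symmetric(Y_star))   # symmetric off-diagonal blocks
print(off_diagonal_symmetric(Y_other))  # any other Y breaks the symmetry
```

Since $D$ is invertible, changing $Y$ while holding the other blocks fixed always changes $CX + DY$, which is the contradiction the proof exploits.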


C.1 Proof of Proposition 5

Proof. Let $m_k(p) = \mathbb{E}_p[y^k]$ denote the $k$th raw moment of $p$. Then we may write $\mu_n(p) = \sum_{k=0}^{n} (-1)^{n-k}\binom{n}{k}\, m_k(p)\, m_1(p)^{n-k}$.

Suppose $\mu_n = f \circ \Gamma$ for $\Gamma : \mathcal{P} \to \mathbb{R}^2$ identified by $V$ and elicited by a twice differentiable loss $L$. Fix $p' \in \mathcal{P}$, $r = \Gamma(p')$, and consider $p \in \Gamma_r$. By assumption we must have $\mu_n(p) = \mu_n(p') = f(r)$, which by convexity of $\Gamma_r$ implies $\mu_n(\lambda p + (1-\lambda)p') = f(r)$ for all $\lambda \in [0,1]$. Expanding this equation out yields a polynomial in $\lambda$, which by linearity of the $m_k$ becomes

\[
f(r) = \sum_{k=0}^{n} (-1)^{n-k}\binom{n}{k}\,\bigl(m'_k + \lambda(m_k - m'_k)\bigr)\bigl(m'_1 + \lambda(m_1 - m'_1)\bigr)^{n-k} , \tag{5}
\]

where we write $m_k = m_k(p)$ and $m'_k = m_k(p')$. For this polynomial to be constant in $\lambda$ on $[0,1]$, the coefficients of nonzero powers of $\lambda$ must be zero. Expanding the first few terms of eq. (5) gives

\[
\begin{aligned}
f(r) &= (-1)^{n}\bigl(m'_0 + \lambda(m_0 - m'_0)\bigr)\bigl(m'_1 + \lambda(m_1 - m'_1)\bigr)^{n} \\
&\quad + (-1)^{n-1}\binom{n}{1}\bigl(m'_1 + \lambda(m_1 - m'_1)\bigr)\bigl(m'_1 + \lambda(m_1 - m'_1)\bigr)^{n-1} + O(\lambda^{n-1}) \\
&= (-1)^{n-1}(n-1)\bigl(m'_1 + \lambda(m_1 - m'_1)\bigr)^{n} + O(\lambda^{n-1}) ,
\end{aligned}
\]

where we used $m_0 = m'_0 = 1$ and $O(\lambda^{n-1})$ encompasses the remaining terms. Now, in particular, the coefficient of $\lambda^n$ must be zero, which gives us $(-1)^{n-1}(n-1)(m_1 - m'_1)^n = 0$, yielding $m_1 = m'_1$. As $m_1(p)$ is constant for all $p \in \Gamma_r$, we have some $c_1(r)$ such that

\[
\mathbb{E}_p[y - c_1(r)] = 0 . \tag{6}
\]

Now the constraint that $\mu_n(p) = f(r)$ may be simplified to

\[
f(r) = \mu_n(p) = \mathbb{E}_p[(y - \mathbb{E}_p[y])^n] = \mathbb{E}_p[(y - c_1(r))^n] . \tag{7}
\]

Thus, by Lemma 9, $\Gamma \in \mathcal{I}_2$ is identified by $V(r,y) = \{y - c_1(r),\, (y - c_1(r))^n - f(r)\}$. Letting $g(r) = \{c_1(r), f(r)\}$, we can see that $g$ must be invertible: if $g(r) = g(r')$ for $r \ne r'$, then as $V$ identifies $\Gamma$ but depends on $r$ only through $g$, we have $\Gamma_r = \{p \in \mathcal{P} : V(r,p) = 0\} = \Gamma_{r'}$, a contradiction. We conclude $\Gamma = g^{-1} \circ \Gamma'$ for $\Gamma'(p) = \{m_1(p), \mu_n(p)\}$. □
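
The expansion in eq. (5) and its leading coefficient can be sanity-checked numerically. The sketch below assumes two hypothetical distributions on a common three-point support and $n = 3$; it builds the polynomial in $\lambda$ from raw moments and compares its top coefficient with $(-1)^{n-1}(n-1)(m_1 - m'_1)^n$.

```python
import numpy as np
from math import comb
from numpy.polynomial import Polynomial

# Hypothetical distributions p and p' on a shared finite support.
support = np.array([0.0, 1.0, 3.0])
p = np.array([0.2, 0.5, 0.3])
p_prime = np.array([0.6, 0.3, 0.1])
n = 3

def raw_moment(dist, k):
    """k-th raw moment m_k of a distribution on `support`."""
    return float(np.sum(dist * support ** k))

def central_moment(dist, order):
    """Central moment computed directly from its definition."""
    mean = raw_moment(dist, 1)
    return float(np.sum(dist * (support - mean) ** order))

def mixture_polynomial(order):
    """mu_n(lambda p + (1 - lambda) p') as a polynomial in lambda, per eq. (5)."""
    m = [raw_moment(p, k) for k in range(order + 1)]
    m_prime = [raw_moment(p_prime, k) for k in range(order + 1)]
    mean_poly = Polynomial([m_prime[1], m[1] - m_prime[1]])
    total = Polynomial([0.0])
    for k in range(order + 1):
        moment_poly = Polynomial([m_prime[k], m[k] - m_prime[k]])
        sign_and_binom = (-1) ** (order - k) * comb(order, k)
        total = total + sign_and_binom * moment_poly * mean_poly ** (order - k)
    return total

poly = mixture_polynomial(n)
mean_gap = raw_moment(p, 1) - raw_moment(p_prime, 1)
predicted_leading = (-1) ** (n - 1) * (n - 1) * mean_gap ** n
print(poly.coef[n], predicted_leading)
```

In this example the two means differ, so the leading coefficient is nonzero and the central moment cannot be constant along the segment between $p$ and $p'$, in line with the argument above.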

D Identification Lower Bounds 

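Lemma 8. Let $\Gamma \in \mathcal{I}(\mathcal{P})$ be given, and suppose for some $r \in \Gamma(\mathcal{P})$ there exists $V : \Omega \to \mathbb{R}^k$ with $\mathbb{E}_p[V] = 0$ for all $p \in \Gamma_r$. If $\mathrm{span}(\{\mathbb{E}_p[V] : p \in \mathcal{P}\}) = \mathbb{R}^k$ and some $p \in \Gamma_r$ can be written $p = \lambda p' + (1-\lambda)p''$ where $p', p'' \notin \Gamma_r$, then $\mathrm{iden}(\Gamma) \ge k$.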

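Proof. The proof proceeds in two parts. First, we show that the conditions regarding $V$ suffice to show that $\mathrm{codim}(\mathrm{span}(\Gamma_r)) \ge k$ in $\mathrm{span}(\mathcal{P})$. Second, we show that this means (any flat subset of) $\Gamma_r$ cannot be identified by a $W : \mathrm{span}(\mathcal{P}) \to \mathbb{R}^\ell$ for $\ell < k$.

Let $V$ and $r$ as in the statement of the lemma be given. By definition, $\mathrm{codim}(\mathrm{span}(\Gamma_r)) = \dim(\mathrm{span}(\mathcal{P})/\mathrm{span}(\Gamma_r))$, where $S_1/S_2$ is the quotient space of $S_1$ by $S_2$. Let $\pi_{\Gamma_r} : \mathrm{span}(\mathcal{P}) \to \mathrm{span}(\mathcal{P})/\mathrm{span}(\Gamma_r)$ denote the projection from $\mathrm{span}(\mathcal{P})$ to its quotient by $\mathrm{span}(\Gamma_r)$. By the universal property of quotient spaces, there is a unique $T_V : \mathrm{span}(\mathcal{P})/\mathrm{span}(\Gamma_r) \to \mathbb{R}^k$ such that $V = T_V \circ \pi_{\Gamma_r}$. By the rank-nullity theorem, $\dim(\mathrm{span}(\mathcal{P})/\mathrm{span}(\Gamma_r)) = \dim(\ker(T_V)) + \dim(\mathrm{im}(T_V))$. By assumption $\dim(\mathrm{im}(T_V)) = \dim(\mathrm{im}(V)) = k$, so $\mathrm{codim}(\mathrm{span}(\Gamma_r)) \ge k$.

Now assume for contradiction that $\Gamma = f \circ \hat\Gamma$, with $\hat\Gamma \in \mathcal{I}_\ell(\mathcal{P})$ and $\ell < k$. Let $r'$ denote the level set such that $p \in \hat\Gamma_{r'}$. Since $\hat\Gamma_{r'} \subseteq \Gamma_r$, $\mathrm{codim}(\hat\Gamma_{r'}) \ge \mathrm{codim}(\Gamma_r) \ge k$. Let $W : \mathrm{span}(\mathcal{P}) \to \mathbb{R}^\ell$ identify $\hat\Gamma_{r'}$. As before, there is a unique $T_W : \mathrm{span}(\mathcal{P})/\mathrm{span}(\hat\Gamma_{r'}) \to \mathbb{R}^\ell$ such that $W = T_W \circ \pi_{\hat\Gamma_{r'}}$. By the rank-nullity theorem, $\dim(\mathrm{span}(\mathcal{P})/\mathrm{span}(\hat\Gamma_{r'})) = \dim(\ker(T_W)) + \dim(\mathrm{im}(T_W))$. Thus $\dim(\ker(T_W)) \ge k - \ell > 0$. To complete the proof we need to show that this means there is a $q \in \mathcal{P} \setminus \hat\Gamma_{r'}$ such that $\pi_{\hat\Gamma_{r'}}(q) \in \ker(T_W)$.

To this end, let $q'' \in \mathrm{span}(\mathcal{P})$. Then $q'' = \sum_i \lambda_i q_i$ with $q_i \in \mathcal{P}$ for all $i$. Thus $\pi_{\hat\Gamma_{r'}}(q'') = \pi_{\hat\Gamma_{r'}}(\sum_i \lambda_i q_i) = \sum_i \lambda_i \pi_{\hat\Gamma_{r'}}(q_i)$ and so $\mathrm{span}(\mathcal{P})/\mathrm{span}(\hat\Gamma_{r'}) = \mathrm{span}(\{\pi_{\hat\Gamma_{r'}}(q') \mid q' \in \mathcal{P}\})$. Since $p = \lambda p' + (1-\lambda)p''$ where $p', p'' \notin \Gamma_r$, $\pi_{\hat\Gamma_{r'}}(p) = 0$ is not an extreme point of the convex set $\{\pi_{\hat\Gamma_{r'}}(q') \mid q' \in \mathcal{P}\}$. Since $\dim(\ker(T_W)) > 0$, this means there exists $q \in \mathcal{P} \setminus \hat\Gamma_{r'}$ such that $\pi_{\hat\Gamma_{r'}}(q) \in \ker(T_W)$. This contradicts the assumption that $W$ identifies $\hat\Gamma_{r'}$, completing the proof. □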
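
Lemma 9. Let $V : \Gamma(\mathcal{P}) \times \Omega \to \mathbb{R}^k$ identify $\Gamma$, and suppose for all $r \in \mathrm{relint}(\Gamma(\mathcal{P}))$ there exist $p, p' \in \mathcal{P}$ and $\lambda \in (0,1)$ such that $r = \Gamma(\lambda p + (1-\lambda)p') \ne \Gamma(p)$ and $\mathrm{span}(\{\mathbb{E}_p[V(r,\omega)] : p \in \mathcal{P}\}) = \mathbb{R}^k$. Let $u : \Gamma(\mathcal{P}) \times \Omega \to \mathbb{R}$ be given. If for all $r \in \Gamma(\mathcal{P})$ we have $\Gamma(p) = r \implies \mathbb{E}_p[u(r,\omega)] = 0$ and $\mathbb{E}_p[u(r,\omega)] \ne 0$ for some $p \in \mathcal{P}$, then there exists $V' : \Gamma(\mathcal{P}) \times \Omega \to \mathbb{R}^k$ identifying $\Gamma$ with $V'_1 = u$.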


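Proof. Fix $r \in \mathrm{relint}(\Gamma(\mathcal{P}))$. As in Lemma 8 we will treat functions $f : \Omega \to \mathbb{R}^\ell$ as linear maps from $\mathrm{span}(\mathcal{P})$ to $\mathbb{R}^\ell$, so that $\mathrm{im}\, f = \{\mathbb{E}_p[f] : p \in \mathrm{span}(\mathcal{P})\}$.

Let $U : \Omega \to \mathbb{R}^{k+1}$ be given by $U(\omega) = \{V(r,\omega), u(r,\omega)\}$. If we have $\mathrm{im}\, U = \mathbb{R}^{k+1}$, then Lemma 8 gives us a contradiction with $V(r,\cdot) : \Omega \to \mathbb{R}^k$. Thus $\dim \mathrm{im}\, U = k$, and there exists some $\alpha \in \mathbb{R}^{k+1}$, $\alpha \ne 0$, such that $\alpha^\top U = 0$ on $\mathrm{span}(\mathcal{P})$. As $\dim \mathrm{im}\, V(r,\cdot) = k$, we cannot have $\alpha_{k+1} = 0$, and as $u \ne 0$ on $\mathrm{span}(\mathcal{P})$, we must have some $i \ne k+1$ with $\alpha_i \ne 0$. Taking $\alpha_i = -1$ without loss of generality, we have $V_i = U_i = \sum_{j \ne i} \alpha_j U_j$ on $\mathrm{span}(\mathcal{P})$. Taking $V'(r,\cdot) = \{u(r,\cdot)\} \cup \{V_j(r,\cdot)\}_{j \ne i}$, we have for all $p \in \mathcal{P}$,

\[
\mathbb{E}_p[V'(r,\omega)] = 0 \iff \forall j \ne i \;\; \mathbb{E}_p[U_j] = 0 \iff \mathbb{E}_p[U] = 0 \iff \mathbb{E}_p[V(r,\omega)] = 0 .
\]

□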


E Other Omitted Material 

E.1 Losses for Expected Shortfall

Corollary 1 gives us a large family of losses eliciting $\{\mathrm{ES}_\alpha, q_\alpha\}$ (see footnote 7). Letting $L(a,y) = \frac{1}{\alpha}(a - y)\mathbb{1}_{a \ge y} - a$, we have $\mathrm{ES}_\alpha(p) = \inf_{a \in \mathbb{R}} L(a,p) = \underline{L}(p)$. Thus we may take

\[
L((r,a),y) = L'(a,y) + H(r) + h(r)\bigl(L(a,y) - r\bigr) , \tag{8}
\]

where $h(r)$ is positive and decreasing, $H(r) = \int_0^r h(x)\,dx$, and $L'(a,y)$ is any other loss for $q_\alpha$, the full characterization of which is given in Gneiting [7, Theorem 9]:

\[
L'(a,y) = (\mathbb{1}_{a \ge y} - \alpha)\bigl(f(a) - f(y)\bigr) + g(y) , \tag{9}
\]

where $f : \mathbb{R} \to \mathbb{R}$ is nondecreasing and $g$ is an arbitrary $\mathcal{P}$-integrable function.^8 Hence, losses of the following form suffice:

\[
L((r,a),y) = (\mathbb{1}_{a \ge y} - \alpha)\bigl(f(a) - f(y)\bigr) + \tfrac{1}{\alpha}h(r)\mathbb{1}_{a \ge y}(a - y) - h(r)(a + r) + H(r) + g(y) .
\]

Comparing our $L((r,a),y)$ to the characterization given by Fissler and Ziegel [15, Cor. 5.5], we see that we recover all possible scores for this case (at least when restricting to $\mathcal{P}$ which ). Note however that due to a differing convention in the sign of $\mathrm{ES}_\alpha$, their loss is given by $L((-x_1, x_2), y)$.
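
The identity $\mathrm{ES}_\alpha(p) = \inf_a L(a,p)$ can be checked on a toy sample. The following is a minimal sketch under assumptions made here: a hypothetical sample $1, \ldots, 10$, $\alpha = 0.2$, and a grid search over $a$. Under the sign convention of this section, the minimum expected loss in this example equals the negative of the average of the lowest $\alpha$-fraction of the sample, and the minimizers are the $\alpha$-quantiles.

```python
import numpy as np

alpha = 0.2
y = np.arange(1.0, 11.0)   # hypothetical sample 1, 2, ..., 10

def es_loss(a, y):
    """L(a, y) = (1/alpha) (a - y) 1{a >= y} - a."""
    return (a - y) * (a >= y) / alpha - a

# Empirical expected loss on a grid of candidate quantile reports a.
grid = np.linspace(0.0, 11.0, 1101)
risk = np.array([es_loss(a, y).mean() for a in grid])

bayes_risk = risk.min()                          # inf_a L(a, p)
minimizers = grid[np.isclose(risk, bayes_risk)]  # all alpha-quantiles on the grid

# Average of the lowest alpha-fraction of the sample.
tail_mean = np.sort(y)[: int(alpha * len(y))].mean()
print(bayes_risk, -tail_mean, minimizers.min(), minimizers.max())
```

The flat region of minimizers reflects the non-uniqueness of the quantile for a discrete sample; the minimum value itself is unaffected by which quantile is chosen.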


^8 Note that Gneiting [7] assumes $L(x,y) \ge 0$, $L(x,x) = 0$, $L$ is continuous in $x$, and $dL/dx$ exists and is continuous in $x$ when $y \ne x$; we add $g$ because we do not normalize.